Strategies for Training Large Vocabulary Neural Language Models

Authors

  • Wenlin Chen
  • David Grangier
  • Michael Auli
Abstract

Training neural network language models over large vocabularies is computationally costly compared to count-based models such as Kneser-Ney. We present a systematic comparison of neural strategies to represent and train large vocabularies, including softmax, hierarchical softmax, target sampling, noise contrastive estimation and self normalization. We extend self normalization to be a proper estimator of likelihood and introduce an efficient variant of softmax. We evaluate each method on three popular benchmarks, examining performance on rare words, the speed/accuracy trade-off and complementarity to Kneser-Ney.
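The cost the abstract refers to comes from the softmax normalizer, which must be computed over the entire vocabulary at every training position; sampling-based strategies shrink that sum to a small candidate set. A rough, self-contained sketch of the difference (array sizes, the candidate count, and the sampling scheme are assumed for illustration, not taken from the paper):

```python
import numpy as np

# Assumed toy sizes: the point is only that the full softmax normalizer touches
# every one of the V output rows, while a sampled objective touches about k of them.
V, d, k = 100_000, 128, 512
rng = np.random.default_rng(0)
h = rng.standard_normal(d)          # hidden state for one training position
W = rng.standard_normal((V, d))     # output word embeddings
target = 42                         # index of the observed next word

# Full softmax cross-entropy: O(V * d) per position just for the normalizer.
logits = W @ h
full_loss = np.logaddexp.reduce(logits) - logits[target]

# Target-sampling flavour: normalize over the target plus k sampled candidates,
# O(k * d). (Possible duplicates of the target are ignored in this sketch.)
cand = np.concatenate(([target], rng.choice(V, size=k, replace=False)))
sampled_logits = W[cand] @ h
sampled_loss = np.logaddexp.reduce(sampled_logits) - sampled_logits[0]

print(full_loss, sampled_loss)
```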


Related articles

Large Vocabulary SOUL Neural Network Language Models

This paper presents a continuation of research on Structured OUtput Layer Neural Network language models (SOUL NNLM) for automatic speech recognition. As SOUL NNLMs allow estimating probabilities for all in-vocabulary words, and not only for those pertaining to a limited shortlist, we investigate their performance on a large-vocabulary task. Significant improvements both in perplexity and word error...

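The structured output layer mentioned above factors the word probability over word classes so that no single softmax spans the full vocabulary. A minimal two-level sketch of that general idea, with hard class assignments and toy sizes assumed here (the actual SOUL model learns its clustering and uses a deeper hierarchy plus a shortlist):

```python
import numpy as np

# Assumed toy sizes and a random hard assignment of words to classes.
V, C, d = 100_000, 200, 128
rng = np.random.default_rng(0)
word_class = rng.integers(0, C, size=V)   # class of each word
W_class = rng.standard_normal((C, d))     # class softmax parameters
W_word = rng.standard_normal((V, d))      # within-class word softmax parameters
h = rng.standard_normal(d)                # hidden state

def log_softmax(x):
    return x - np.logaddexp.reduce(x)

def log_prob(word):
    """log P(word | h) = log P(class | h) + log P(word | class, h)."""
    c = word_class[word]
    members = np.flatnonzero(word_class == c)        # ~V/C words share this class
    lp_class = log_softmax(W_class @ h)[c]           # softmax over C classes
    lp_within = log_softmax(W_word[members] @ h)     # softmax over class members only
    return lp_class + lp_within[members == word][0]

print(log_prob(42))
```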

Using Random Forest Language Models in the IBM CTS System

One of the challenges in large vocabulary speech recognition is the availability of large amounts of data for training language models. In most state-of-the-art speech recognition systems, n-gram models with Kneser-Ney smoothing still prevail due to their simplicity and effectiveness. In this paper, we study the performance of a new language model, the random forest language model, in the IBM co...


CSLM - a modular open-source continuous space language modeling toolkit

Language models play a very important role in many natural language processing applications, in particular large vocabulary speech recognition and statistical machine translation. For a long time, back-off n-gram language models were considered to be the state of the art when large amounts of training data are available. Recently, so-called continuous space methods or neural network language models...


Multi-Language Neural Network Language Models

In recent years there has been considerable interest in neural network based language models. These models typically consist of vocabulary-dependent input and output layers and one or more hidden layers. A standard problem with these networks is that large quantities of training data are needed to robustly estimate the model parameters. This poses a challenge when only limited data is availab...


A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models

Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (BNCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, a...

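Noise contrastive estimation, which BNCE builds on, sidesteps the full-softmax normalizer by training the model to discriminate observed words from sampled noise words, treating its scores as self-normalized. A generic NCE sketch for one prediction (variable names and the noise distribution are assumed; this is not the batched variant the paper proposes):

```python
import numpy as np

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)

def nce_loss(score_target, scores_noise, log_q_target, log_q_noise, k):
    """Binary NCE loss for a single prediction.

    score_target : unnormalized model score s(w, h) of the observed word
    scores_noise : scores of the k sampled noise words
    log_q_*      : log-probabilities of those words under the noise distribution q
    """
    # Scores are treated as already normalized, so the loss never sums over the
    # full vocabulary: P(data | w) = sigmoid(s(w, h) - log(k * q(w))).
    pos = log_sigmoid(score_target - (np.log(k) + log_q_target))
    neg = log_sigmoid(-(scores_noise - (np.log(k) + log_q_noise)))
    return -(pos + neg.sum())

# Toy usage with made-up numbers: one observed word and k = 5 noise samples.
rng = np.random.default_rng(0)
print(nce_loss(score_target=2.0,
               scores_noise=rng.standard_normal(5),
               log_q_target=-8.0,
               log_q_noise=np.full(5, -8.0),
               k=5))
```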


Journal:
  • CoRR

Volume: abs/1512.04906   Issue: -

Pages: -

Publication date: 2016